statistical power
Hypothesis Testing for Differentially Private Linear Regression
The majority of our hypothesis tests are based on differentially private versions of the $F$-statistic for the general linear model framework, which are uniformly most powerful unbiased in the non-private setting. We also present another test for testing mixtures, based on the differentially private nonparametric tests of Couch, Kazan, Shi, Bray, and Groce (CCS 2019), which is especially suited for the small dataset regime. We show that the differentially private $F$-statistic converges to the asymptotic distribution of its non-private counterpart.
Computing Valid p-value for Optimal Changepoint by Selective Inference using Dynamic Programming
Although there is a vast body of literature related to methods for detecting change-points (CPs), less attention has been paid to assessing the statistical reliability of the detected CPs. In this paper, we introduce a novel method to perform statistical inference on the significance of the CPs, estimated by a Dynamic Programming (DP)-based optimal CP detection algorithm. Our main idea is to employ a Selective Inference (SI) approach---a new statistical inference framework that has recently received a lot of attention---to compute exact (non-asymptotic) valid p-values for the detected optimal CPs. Although it is well-known that SI has low statistical power because of over-conditioning, we address this drawback by introducing a novel method called parametric DP, which enables SI to be conducted with the minimum amount of conditioning, leading to high statistical power. We conduct experiments on both synthetic and real-world datasets, through which we offer evidence that our proposed method is more powerful than existing methods, has decent performance in terms of computational efficiency, and provides good results in many practical applications.
Efficient and scalable clustering of survival curves
Villanueva, Nora M., Sestelo, Marta, Meira-Machado, Luis
Survival analysis encompasses a broad range of methods for analyzing time-to-event data, with one key objective being the comparison of survival curves across groups. Traditional approaches for identifying clusters of survival curves often rely on computationally intensive bootstrap techniques to approximate the null hypothesis distribution. While effective, these methods impose significant computational burdens. In this work, we propose a novel approach that leverages the k-means and log-rank test to efficiently identify and cluster survival curves. Our method eliminates the need for computationally expensive resampling, significantly reducing processing time while maintaining statistical reliability. By systematically evaluating survival curves and determining optimal clusters, the proposed method ensures a practical and scalable alternative for large-scale survival data analysis. Through simulation studies, we demonstrate that our approach achieves results comparable to existing bootstrap-based clustering methods while dramatically improving computational efficiency. These findings suggest that the log-rank-based clustering procedure offers a viable and time-efficient solution for researchers working with multiple survival curves in medical and epidemiological studies.
- Europe > Netherlands > South Holland > Rotterdam (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
- (2 more...)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New York (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Health & Medicine > Therapeutic Area > Oncology (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.47)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Power Constrained Nonstationary Bandits with Habituation and Recovery Dynamics
Li, Fengxu, Carpenter, Stephanie M., Buman, Matthew P., Mintz, Yonatan
A common challenge for decision makers is selecting actions whose rewards are unknown and evolve over time based on prior policies. For instance, repeated use may reduce an action's effectiveness (habituation), while inactivity may restore it (recovery). These nonstationarities are captured by the Reducing or Gaining Unknown Efficacy (ROGUE) bandit framework, which models real-world settings such as behavioral health interventions. While existing algorithms can compute sublinear regret policies to optimize these settings, they may not provide sufficient exploration due to overemphasis on exploitation, limiting the ability to estimate population-level effects. This is a challenge of particular interest in micro-randomized trials (MRTs) that aid researchers in developing just-in-time adaptive interventions that have population-level effects while still providing personalized recommendations to individuals. In this paper, we first develop ROGUE-TS, a Thompson Sampling algorithm tailored to the ROGUE framework, and provide theoretical guarantees of sublinear regret. We then introduce a probability clipping procedure to balance personalization and population-level learning, with quantified trade-off that balances regret and minimum exploration probability. Validation on two MRT datasets concerning physical activity promotion and bipolar disorder treatment shows that our methods both achieve lower regret than existing approaches and maintain high statistical power through the clipping procedure without significantly increasing regret. This enables reliable detection of treatment effects while accounting for individual behavioral dynamics. For researchers designing MRTs, our framework offers practical guidance on balancing personalization with statistical validity.
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (2 more...)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Functional Analysis of Variance for Association Studies
Vsevolozhskaya, Olga A., Zaykin, Dmitri V., Greenwood, Mark C., Wei, Changshuai, Lu, Qing
While progress has been made in identifying common genetic variants associated with human diseases, for most of common complex diseases, the identified genetic variants only account for a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With the advance in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. In this paper, we propose a functional analysis of variance (FANOVA) method for testing an association of sequence variants in a genomic region with a qualitative trait. The FANOVA has a number of advantages: (1) it tests for a joint effect of gene variants, including both common and rare; (2) it fully utilizes linkage disequilibrium and genetic position information; and (3) allows for either protective or risk-increasing causal variants. Through simulations, we show that FANOVA outperform two popularly used methods - SKAT and a previously proposed method based on functional linear models (FLM), - especially if a sample size of a study is small and/or sequence variants have low to moderate effects. We conduct an empirical study by applying three methods (FANOVA, SKAT and FLM) to sequencing data from Dallas Heart Study. While SKAT and FLM respectively detected ANGPTL 4 and ANGPTL 3 associated with obesity, FANOVA was able to identify both genes associated with obesity.
- Europe > Austria > Vienna (0.14)
- North America > United States > North Carolina (0.04)
- North America > United States > New Jersey (0.04)
- (3 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.94)
Hierarchical Variable Importance with Statistical Control for Medical Data-Based Prediction
Paillard, Joseph, Collas, Antoine, Engemann, Denis A., Thirion, Bertrand
Recent advances in machine learning have greatly expanded the repertoire of predictive methods for medical imaging. However, the interpretability of complex models remains a challenge, which limits their utility in medical applications. Recently, model-agnostic methods have been proposed to measure conditional variable importance and accommodate complex non-linear models. However, they often lack power when dealing with highly correlated data, a common problem in medical imaging. We introduce Hierarchical-CPI, a model-agnostic variable importance measure that frames the inference problem as the discovery of groups of variables that are jointly predictive of the outcome. By exploring subgroups along a hierarchical tree, it remains computationally tractable, yet also enjoys explicit family-wise error rate control. Moreover, we address the issue of vanishing conditional importance under high correlation with a tree-based importance allocation mechanism. We benchmarked Hierarchical-CPI against state-of-the-art variable importance methods. Its effectiveness is demonstrated in two neuroimaging datasets: classifying dementia diagnoses from MRI data (ADNI dataset) and analyzing the Berger effect on EEG data (TDBRAIN dataset), identifying biologically plausible variables.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > France (0.04)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)